We propose and analyse simple deterministic algorithms that can be used to construct machines 
that have primitive learning capabilities. We demonstrate that locally connected networks of these 
machines can be used to perform blind classification on an event-by-event basis, without storing the 
information of the individual events. We also demonstrate that properly designed networks of these 
machines exhibit behavior that is usually only attributed to quantum systems. We present networks 
that simulate quantum interference on an event-by-event basis. In particular we show that by using 
simple geometry and the learning capabilities of the machines it becomes possible to simulate single- 
photon interference in a Mach-Zehnder interferometer. The interference pattern generated by the 
network of deterministic learning machines is in perfect agreement with the quantum theoretical 
result for the single-photon Mach-Zehnder interferometer. To illustrate that networks of these 
machines are indeed capable of simulating quantum interference we simulate, event-by-event, a 
setup involving two chained Mach-Zehnder interferometers. We show that also in this case the 
simulation results agree with quantum theory. 

PACS numbers: 02.70.-c, 03.65.-w 

Keywords: Computer simulation, machine learning, quantum interference, quantum theory 



I. INTRODUCTION 

Computer simulation is widely regarded as complementary to theory and experiment. At present there are only
a few physical phenomena that cannot be simulated on a computer. One such exception is the double-slit experiment
with single electrons, as carried out by Tonomura and his co-workers. This experiment is carried out in such a
way that at any given time, only one electron travels from the source to the detector. Only after a substantial
number (approximately 50000) of electrons have been detected does an interference pattern emerge. This interference
pattern is described by quantum theory. We use the term "quantum theory" for the mathematical formalism that
gives us a set of algorithms to compute the probability for observing a particular event. Of course, the
quantum-mechanics textbook example of a double slit can be simulated on a computer by solving the
time-dependent Schrödinger equation for a wave packet impinging on the double slit. Alternatively, in order to
obtain the observed interference pattern we could simply use random numbers to generate events according to the
probability distribution that is obtained by solving the time-independent Schrödinger equation. However, that is not
what we mean when we say that the physical phenomenon cannot be simulated on a computer. The point is that
it is not known how to simulate, event-by-event, the experimental observation that the interference pattern appears
only after a considerable number of events have been recorded on the detector. Quantum theory does not describe
the individual events, e.g. the arrival of a single electron at a particular position on the detection screen.
Reconciling the mathematical formalism (that does not describe single events) with the experimental fact that each
observation yields a definite outcome is often referred to as the quantum measurement paradox and is the central,
most fundamental problem in the foundations of quantum theory.

If computer simulation is indeed a third methodology to model physical phenomena it should be possible to simulate 
experiments such as the two-slit experiment on an event-by-event basis. In view of the fundamental problem alluded to 
above there is little hope that we can find a simulation algorithm within the framework of quantum theory. However, if 
we think of quantum theory as a set of algorithms to compute probability distributions there is nothing that prevents 
us from stepping outside the framework that quantum theory provides. Therefore we may formulate the physical 
processes in terms of events, messages, and algorithms that process these events and messages, and try to invent 
algorithms that simulate the physical processes. Obviously, to make progress along this line of thought, it makes 
sense not to tackle the double-slit experiment directly but to simplify the problem while retaining the fundamental 
problem that we aim to solve. 

The main objective of the research reported in this paper is to answer the question: "Can we simulate the
single-photon beam splitter and Mach-Zehnder interferometer experiments of Grangier et al. on an event-by-event
basis?" These experiments display the same fundamental problem as the single-electron double-slit experiments but 
are significantly easier to describe in terms of algorithms. The main results of our research are that we can give 
an affirmative answer to the above question by using algorithms that have a primitive form of learning capability 
and that the simulation approach that we propose can be used to simulate other quantum systems (including the 
double-slit experiment) as well. 

In Section II we introduce the basic concepts for constructing event-based, deterministic learning machines (DLMs). 
An essential property of these machines is that they process input event after input event and do not store information 
about individual events. A DLM can discover relations between input events (if there are any) and responds by sending 
its acquired knowledge in the form of another event (carrying a message) through one of its output channels. By 
connecting an output channel to the input channel of another DLM we can build networks of DLMs. As the input of 
a network receives an event, the corresponding message is routed through the network while it is being processed and 
eventually a message appears at one of the outputs. At any given time during the processing, there is only one input- 
output connection in the network that is actually carrying a message. The DLMs process the messages in a sequential 
manner and communicate with each other by message passing. There is no other form of communication between 
different DLMs. Although networks of DLMs can be viewed as networks that are capable of unsupervised learning,
they have very little in common with neural networks. The first DLM described in Section II A is equivalent to a
standard linear adaptive filter, but the DLMs that we actually use for our applications do not fall into this class
of algorithms. 

In Section III we generalize the ideas of Section II and construct a DLM which groups K-dimensional data in two 
classes on an event-by-event basis, i.e., without using memory to store the whole data set. We demonstrate that this 
DLM is capable of detecting time-dependent trends in the data and performs blind classification. This example shows 
that DLMs can be used to solve problems that have no relation to quantum physics. 

In Section IV we show how to construct DLM-networks that generate output patterns that are usually thought 
of as being of quantum mechanical origin. We first build a DLM-network that simulates photons passing through a 
polarizer and show that quantum theory describes the output of this deterministic, event-based network. Then we 
describe a DLM-network that simulates a beam splitter and use this network to build a Mach-Zehnder interferometer 
and two chained Mach-Zehnder interferometers. We demonstrate that quantum theory also describes the behavior of 
these networks. 

Quantum theory gives us a recipe to compute the frequency of events but does not predict the order in which the
events will be observed. In genuine experiments the detection of events appears to be random, in a sense
which, as far as we know, has not been studied systematically. In our simulation approach, this apparent randomness
can be accounted for by a marginal modification of the DLMs, as explained in Section V. This modification does
not change the deterministic character of the learning process. It merely randomizes the order in which the DLMs
activate their output channels.

A summary and outlook is given in Section VI.



II. DETERMINISTIC LEARNING MACHINES

A. Learning points on the real axis

We consider a machine that has one input and two output channels labeled by ±1 (see Fig. 1). The internal state
of the machine after processing the n-th input event (n = 0, 1, ...) is uniquely defined by the real variable $x_n$. At
the next event n + 1 the machine receives as input a real number $y_{n+1}$. For simplicity, but without loss of generality,
we assume that $y_{n+1} \in [-1, 1]$. The machine responds by sending a message containing $y_{n+1}$ through one of the two
output channels $\Delta_{n+1} = \pm 1$. The machine selects the output channel $\Delta_{n+1} = +1$ or $\Delta_{n+1} = -1$ by minimizing the
cost function $C(\Delta_{n+1})$ defined by

$$C(\Delta_{n+1}) = \bigl|\, y_{n+1} - x_n - (1-\alpha)\,\Delta_{n+1}\,|y_{n+1} - x_n| \,\bigr|, \tag{1}$$

updates its internal state according to the rule

$$x_{n+1} = x_n + (1-\alpha)\,\Delta_{n+1}\,|y_{n+1} - x_n|, \tag{2}$$

and sends a message with the input value $y_{n+1}$ on the selected output channel $\Delta_{n+1}$. The parameter $0 < \alpha < 1$ that
enters Eqs. (1) and (2) controls the decision process. For simplicity we assume that $\alpha$ is fixed during the operation of
the machine.

FIG. 1: Left: Schematic representation of the machine that responds to the input $y_{n+1}$ by passing the input to one of the two
output channels $\Delta_{n+1} = \pm 1$. The value of $\Delta_{n+1}$ depends on the current state of the machine, encoded in the variable $x_n$, the
input $y_{n+1}$, and the update rule Eq. (2) in which $\alpha$ appears as a control parameter. Right: Evolution of the internal variable
$x_n$ as a function of the number of events n. Solid line: $y_{n+1} = -0.5$ for n = 1, ..., 1000 and $y_{n+1} = 0.5$ for n = 1001, ..., 2000;
Dashed line: Random sequence of $y_{n+1} = \pm 0.5$.

It is easy to see that $\Delta_{n+1} = +1$ if $x_n < y_{n+1}$ and $\Delta_{n+1} = -1$ if $x_n > y_{n+1}$. Thus, for this particular machine we
have

$$\Delta_{n+1}\,|y_{n+1} - x_n| = y_{n+1} - x_n. \tag{3}$$

Hence the update rule Eq. (2) can be written as the familiar recursion

$$x_{n+1} = \alpha x_n + (1-\alpha)\, y_{n+1}. \tag{4}$$

The solution of Eq. (4) reads

$$x_n = \alpha^n x_0 + (1-\alpha) \sum_{j=0}^{n-1} \alpha^{n-1-j}\, y_{j+1}, \tag{5}$$

where $x_0$ denotes the initial value of the internal variable.

As an illustration of how this machine learns, we consider the simplest example where $y_{n+1} = y$ for all $n \ge 0$.
Then from Eq. (5) we find that

$$x_n = \alpha^n x_0 + (1-\alpha^n)\, y. \tag{6}$$

As $0 < \alpha < 1$, we conclude that $\lim_{n\to\infty} x_n = y$. Thus the machine "learns" the value of the input variable y. From
Eq. (4) it follows that $x_n < y$ ($x_n > y$) implies $x_{n+1} < y$ ($x_{n+1} > y$). Hence $x_n$ approaches y monotonically (and
$\Delta_n$ is the same for all n). Therefore, if $y_n = y$, the machine always sends the value of $y_n$ through the same output
channel.
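The behaviour derived above can be checked numerically. The following Python sketch is an illustration only: the function name `run_machine` and its arguments are assumptions introduced here and do not appear in the original text. It evaluates the cost of Eq. (1) for both channels, applies the update of Eq. (2), and records the selected channel.

```python
def run_machine(inputs, alpha=0.99, x0=0.0):
    """Process events one by one with the rules of Eqs. (1) and (2).

    Hypothetical helper for illustration; returns the final internal
    variable and the list of selected channels (+1 or -1).
    """
    x = x0
    channels = []
    for y in inputs:
        step = (1.0 - alpha) * abs(y - x)
        # Eq. (1): cost of each candidate channel; a tie resolves to +1.
        delta = min((+1, -1), key=lambda d: abs(y - x - d * step))
        # Eq. (2): move the internal variable towards the input.
        x += delta * step
        channels.append(delta)
    return x, channels


# Constant input: x approaches y monotonically, always on the same channel.
x_const, ch_const = run_machine([0.5] * 2000)

# Input that switches from -0.5 to 0.5 after 1000 events: the machine
# "forgets" the first value and learns the second one.
x_switch, _ = run_machine([-0.5] * 1000 + [0.5] * 1000)
print(x_const, set(ch_const), x_switch)
```

Because the machine keeps only the single number `x`, no individual event is stored, in line with the event-by-event processing described in the text.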

A distinct feature of this machine is its ability to adapt to changes in the input pattern. We illustrate this important
property by two examples. Let $y_n = -0.5$ for $1 \le n \le 1000$ and $y_n = 0.5$ for $1000 < n \le 2000$. During the first
1000 events the machine will learn -0.5. After 1000 events only 0.5 is being presented as input. Then, the machine
"forgets" -0.5 and learns 0.5 as shown in the right panel of Fig. 1. In this simulation $\alpha = 0.99$. Alternatively, if $y_n$ is
a random sequence of ±0.5 (each with the same probability) the machine has to learn -0.5 and 0.5 simultaneously.
Because of this it cannot "forget" and it ends up oscillating around the mean of the input values (zero in this example)
as illustrated in the right panel of Fig. 1. Let us now assume that our machine has reached this oscillating state.
All input events $y_n = 0.5$ give $\Delta_n = +1$ and hence the machine sends 0.5 over the +1 channel. A second machine
attached to this channel only receives 0.5 events and will learn 0.5. This suggests that a network of these machines
can be used as an adaptive classifier.

FIG. 2: Left: Diagram of the three-level machine that adaptively classifies the input data $y_{n+1}$. Right: Evolution of the internal
variables $x_n$ of the machines as a function of the number of events n. The machine number is used to label the corresponding
line. Top right: First three machines; Bottom right: Third-level machines.

Consider the network of three layers of machines shown in the left panel of Fig. 2. Each machine in the network
learns the average of the numbers it receives at its input channel and sends the numbers which are smaller (larger or
equal) than the number it learned to the -1 (+1) output channel. In our numerical experiments we set $\alpha = 0.99$. We
start with 5000 events of random numbers $y_{n+1} \in \{-0.75, -0.25, 0.25, 0.75\}$, each occurring with equal probability.
Machine 1 learns the average (zero in this example) and sends the negative (positive) $y_{n+1}$ over the -1 (+1) channel to
the input of machine 2 (3). Machine 2 (3) learns -0.50 (0.50), as shown in the top right panel of Fig. 2, and sends -0.75
(0.25) over its -1 output channel and -0.25 (0.75) over its +1 output channel. Machines 4 to 7 learn -0.75, -0.25, 0.25
and 0.75, respectively, as shown in the bottom right panel of Fig. 2. Each of these machines forwards the received
input on its +1 (-1) output channel if the initial value of its internal variable is smaller (larger) than the received
input value. Let us now assume that after 5000 events the input data set changes to $y_{n+1} \in \{-0.75, -0.25, 0.25, 0.50\}$.
As can be seen from the right panel of Fig. 2, machines 1, 3 and 7 "forget" the number they learned and replace it by
-0.0625, 0.375 and 0.50, respectively. All other machines are unaffected because they never get 0.50 as input. After
another 5000 events we change the set of input values once more, this time to $y_{n+1} \in \{-0.60, -0.75, -0.25, 0.25, 0.50\}$,
i.e., we add one element. Now, machine 1 learns -0.17, machine 2 learns -0.53 and the internal state of machine 3
remains unchanged. Machine 4 can now receive two numbers on its input channel, namely -0.75 and -0.60. As a
consequence, machine 4 learns -0.675, i.e., the average of the two possible input numbers. Machine 4 puts -0.60 on its
+1 output channel and -0.75 on its -1 output channel. In order for the network to learn all the numbers of the input
set, we would have to attach one extra machine to each output channel of machine 4.

FIG. 3: Left: Time evolution of the internal variable $x_n$ of the machine defined by Eqs. (7) and (8). The input events are
$y = -0.25$, $\alpha = 0.99$, and the initial value $x_0 = 0$. For n > 30 the internal variable $x_n$ oscillates about y. For n > 500 the
sequence of increments ($\Delta_{n+1} = +1$) and decrements ($\Delta_{n+1} = -1$) of $x_n$ repeats itself after 8 events (data not shown). Lines
are guides to the eyes. Right: The number of increments of the internal variable ($\Delta_{n+1} = +1$) divided by the total number of
events as a function of the value of the input variable y. Bullets: Each data point is obtained from a simulation of 1000 events
with a fixed, randomly chosen value of $-1 < y < 1$, using the last 500 events to count the number of $\Delta_{n+1} = +1$ events. Solid
line: $(1 + y)/2$.
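The first stage of this experiment (5000 events drawn from four equally likely values) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the class `Machine`, the function `run_tree`, the zero initial states and the random seed are choices made here, not details given in the text. Each machine routes on its state before the update and then applies the recursion of Eq. (4).

```python
import random


class Machine:
    """One machine of Section II A: learns a running average and routes inputs."""

    def __init__(self, alpha=0.99, x0=0.0):
        self.alpha = alpha
        self.x = x0

    def process(self, y):
        # Route on the current state: smaller -> -1, larger or equal -> +1.
        delta = +1 if y >= self.x else -1
        # Eq. (4): x <- alpha * x + (1 - alpha) * y.
        self.x = self.alpha * self.x + (1.0 - self.alpha) * y
        return delta


def run_tree(events, alpha=0.99):
    """Feed events through a three-layer binary tree of machines 1..7."""
    machines = {k: Machine(alpha) for k in range(1, 8)}
    # children[k] = (machine on the -1 channel, machine on the +1 channel)
    children = {1: (2, 3), 2: (4, 5), 3: (6, 7)}
    for y in events:
        k = 1
        while True:
            delta = machines[k].process(y)
            if k not in children:
                break
            k = children[k][0] if delta == -1 else children[k][1]
    return {k: machines[k].x for k in machines}


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
events = [rng.choice([-0.75, -0.25, 0.25, 0.75]) for _ in range(5000)]
learned = run_tree(events)
print({k: round(v, 3) for k, v in learned.items()})
```

During the first few dozen events the second-layer machines have not yet separated their two inputs, so the third-layer machines briefly receive mixed data; with $\alpha = 0.99$ this transient is forgotten long before the 5000 events are over.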



B. Learning points on a finite interval 

For the machine defined by Eqs. (1) and (2), formulating the operation of the machine through the minimization
of the difference between the input and internal variable may seem a little superfluous and indeed, for this particular
machine it is. However, this formulation is a convenient starting point for defining machines that can perform more
intricate tasks. For instance, let us make an innocent looking change to the update rule Eq. (2) by writing

$$x_{n+1} = \alpha x_n + (1-\alpha)\,\Delta_{n+1}, \tag{7}$$

and replace the cost function Eq. (1) by the corresponding expression

$$C(\Delta_{n+1}) = |y_{n+1} - \alpha x_n - (1-\alpha)\,\Delta_{n+1}|. \tag{8}$$

For $\Delta_{n+1} = +1$ we have $x_{n+1} = 1 - \alpha(1 - x_n)$ and for $\Delta_{n+1} = -1$ we have $x_{n+1} = -1 + \alpha(1 + x_n)$. Therefore, if
$0 < \alpha < 1$ and $|x_0| \le 1$, the internal variable will always be in the range [-1, 1]. At each event the internal variable
either increases by $(1-\alpha)(1 - x_n)$ (if $\Delta_{n+1} = +1$) or decreases by $(1-\alpha)(1 + x_n)$ (if $\Delta_{n+1} = -1$). In both cases
this change is always nonzero, except if $x_n = \pm 1$, which can only occur if $y_{n+1} = \pm 1$. The ratio of the step sizes is
$(1 - x_n)/(1 + x_n)$.

The machine defined by Eqs. (7) and (8) behaves differently from the machine defined by Eqs. (1) and (2). To see
this, it is instructive to consider the case $0 \le y_{n+1} = y < 1$ for all $n \ge 0$ (the case $-1 < y_{n+1} = y \le 0$ can be treated
in the same manner). For concreteness we assume that $-1 \le x_0 < y$. At the first event, minimization of Eq. (8) yields
$\Delta_1 = +1$ and $x_1 = 1 + \alpha(x_0 - 1)$. In other words, the internal variable x moves towards y. As long as $x_n < y$, the
machine selects $\Delta_{n+1} = +1$, always increasing its internal variable $x_n$. For some $n \ge 1$ we must have $x_n > y$.
Then, making another move in the positive x-direction allows for two different decisions. If the error that results is
larger than the error that is obtained by moving in the negative direction, the machine decides to set $\Delta_{n+1} = -1$.
Otherwise it makes another move in the positive x-direction ($\Delta_{n+1} = +1$). In any case, for some $n \ge 1$ the machine
will select $\Delta_{n+1} = -1$. Note that when this happens, we must have $x_{n+1} < y$ and $\Delta_{n+2} = +1$. This implies that
after this n-th event (that we denote by $n_0$) the internal variable will oscillate (forever) around the input value y.
This process is illustrated in Fig. 3 (left).

For $m > n_0$ we have $|x_{m+1} - y| \le (1-\alpha)\max(1 - y, 1 + y)$. Thus, if $0 < 1 - \alpha \ll 1$, the amplitude of the
oscillations is small. The machine "learns" the input value y and the ratio of the increments to decrements is
$(1 + x_{m+1})/(1 - x_{m+1}) \approx (1 + y)/(1 - y)$. In this stationary regime of oscillating behavior, the relative frequency with
which the machine activates the +1 (-1) channel is given by $(1 + y)/2$ ($(1 - y)/2$). The simulation results shown in Fig. 3 (right)
confirm the correctness of this analysis. For a fixed (unknown) value of the input variable, the rate at which the
machine defined by the rules Eqs. (7) and (8) activates one of its output channels is determined by the value of its
internal variable. Therefore, this rate reflects the value that the machine has learned by processing the input events.
Depending on the application, the message that is sent through the active output channel can contain $x_{n+1}$ or the
input value $y_{n+1}$ (there is nothing else that can be sent). Obviously we can make the learning process more precise
by increasing $\alpha$ towards 1. Of course, a larger value of $\alpha$ also results in slower learning: In general it will take more events
for the internal variable to reach the value where it starts to oscillate.
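The stationary-state result, a +1 frequency of $(1 + y)/2$, is easy to reproduce. The sketch below is a hedged illustration: the helper names `run_interval_machine` and `plus_fraction`, the starting value $x_0 = 0$ and the choice of counting over the last 500 of 1000 events (mirroring the caption of Fig. 3) are assumptions of this example.

```python
def run_interval_machine(y, n_events=1000, alpha=0.99, x0=0.0):
    """Event-by-event machine of Eqs. (7) and (8) for a fixed input y."""
    x = x0
    deltas = []
    for _ in range(n_events):
        # Eq. (7): the two candidate new states, one per output channel.
        candidates = {d: alpha * x + (1.0 - alpha) * d for d in (+1, -1)}
        # Eq. (8): select the channel whose candidate lies closest to y.
        delta = min(candidates, key=lambda d: abs(y - candidates[d]))
        x = candidates[delta]
        deltas.append(delta)
    return x, deltas


def plus_fraction(y, n_events=1000, tail=500):
    """Fraction of +1 events among the last `tail` events."""
    _, deltas = run_interval_machine(y, n_events)
    last = deltas[-tail:]
    return last.count(+1) / len(last)


for y in (-0.25, 0.5):
    print(y, plus_fraction(y), (1 + y) / 2)
```

Summing Eq. (7) over a window of N events gives $N_+/N = 1/2 + \bar{x}/2 + (x_{\rm end} - x_{\rm start})/[2N(1-\alpha)]$, where $\bar{x}$ is the mean internal variable over the window, so the measured fraction can differ from $(1 + y)/2$ only by terms of the order of the oscillation amplitude.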

C. Learning points on a circle 

In going from the first to the second example of Section II we changed the update rule such that the variable $x_n$
is constrained to lie in the interval [-1, 1]. We now consider the two-dimensional analogue of the machines described
in Section II B, for which the internal vector $(x_{1,n}, x_{2,n})$ and input vector $(y_{1,n+1}, y_{2,n+1})$ represent points on a circle.
This machine receives as input a sequence of angles $\phi_{n+1}$ defined by

$$\cos\phi_{n+1} = \frac{y_{1,n+1}}{\sqrt{y_{1,n+1}^2 + y_{2,n+1}^2}}, \qquad \sin\phi_{n+1} = \frac{y_{2,n+1}}{\sqrt{y_{1,n+1}^2 + y_{2,n+1}^2}}, \tag{9}$$

and responds by activating one of the two output channels.
For all $n \ge 0$, the update rules are defined by

$$x_{1,n+1} = \alpha x_{1,n} + \beta\,\Theta_{n+1}, \qquad x_{2,n+1} = \alpha x_{2,n} + \beta\,(1 - \Theta_{n+1}), \tag{10}$$

where $\Theta_{n+1} = 0, 1$ and $0 < \alpha < 1$. In order that the internal vector $\mathbf{x}_{n+1} = (x_{1,n+1}, x_{2,n+1})$ stays on the unit circle
we must have

$$\beta = -\alpha\,[x_{1,n}\Theta_{n+1} + x_{2,n}(1 - \Theta_{n+1})] \pm \sqrt{1 - \alpha^2 + \alpha^2\,[x_{1,n}^2\Theta_{n+1} + x_{2,n}^2(1 - \Theta_{n+1})]}. \tag{11}$$

Substitution of Eq. (11) in Eq. (10) gives us four different rules:

$$x_{2,n+1} = s\sqrt{1 + \alpha^2(x_{2,n}^2 - 1)}, \quad x_{1,n+1} = \alpha x_{1,n} \quad \text{if } \Theta_{n+1} = 0,$$
$$x_{1,n+1} = s\sqrt{1 + \alpha^2(x_{1,n}^2 - 1)}, \quad x_{2,n+1} = \alpha x_{2,n} \quad \text{if } \Theta_{n+1} = 1, \tag{12}$$

where $s = \pm 1$ takes care of the fact that for each choice of $\Theta_{n+1}$, the machine has to decide between two quadrants.
The cost function is defined by

$$C = -(x_{1,n+1}\,y_{1,n+1} + x_{2,n+1}\,y_{2,n+1}). \tag{13}$$

Obviously, the cost function Eq. (13) is nothing but minus the inner product of the vectors $\mathbf{x}_{n+1}$ and $\mathbf{y}_{n+1}$. The new internal
state itself is determined by calculating the cost Eq. (13) for each of the four candidate update rules listed in Eq. (12)
and selecting the rule that yields the minimum cost. Note that the minimum of the cost function Eq. (13) does not
depend on the length of the vector of input variables $(y_{1,n+1}, y_{2,n+1})$. From Eq. (12) it follows that if $\Theta_{n+1} = 0$, the
value of $x_{1,n+1}$ is obtained by rescaling of $x_{1,n}$ and $x_{2,n+1}$ is adjusted such that $x_{1,n+1}^2 + x_{2,n+1}^2 = 1$. For $\Theta_{n+1} = 1$
we interchange the role of the first and second element of $\mathbf{x}_{n+1}$.

FIG. 4: Left: Time evolution of the angle representing the internal vector $\mathbf{x}_n$ of the machine defined by Eqs. (12) and (13).
The input events are vectors $\mathbf{y}_{n+1} = (\cos 30°, \sin 30°)$. The direction of the initial vector $\mathbf{x}_0$ is chosen at random. In this
simulation $\alpha = 0.99$. For n > 60 the ratio of the number of increments ($\Theta_{n+1} = 0$) to decrements ($\Theta_{n+1} = 1$) is 1/3, which
is $(\sin 30°/\cos 30°)^2$. Data for n < 20 has been omitted to show the oscillating behavior more clearly. Lines are guides to
the eyes. Right: The number of $\Theta_{n+1} = 1$ events divided by the total number of events as a function of the value of the
input variable $\phi$. Bullets: Each data point is obtained from a simulation of 1000 events with a fixed, randomly chosen value of
$0 \le \phi < 360°$, using the last 500 events to count the number of $\Theta_{n+1} = 1$ events. Solid line: $\cos^2\phi$.

In general the behavior of the machine defined by rules Eqs. (12) and (13) is difficult to analyze without the use of
a computer. However, for a fixed input vector $\mathbf{y}_{n+1} = \mathbf{y}$ it is clear what the machine will try to do: It will minimize
the cost Eq. (13) by rotating its internal vector $\mathbf{x}_{n+1}$ to bring it as close as possible to $\mathbf{y}$. However, $\mathbf{x}_{n+1}$ will not
converge to a limiting value but instead it will keep oscillating about the input value $\mathbf{y}$. An example of a simulation
is given in Fig. 4 (left). For a fixed input vector $\mathbf{y}_{n+1} = \mathbf{y}$ the machine reaches a stationary state in which its internal
vector oscillates about $\mathbf{y}$. In this stationary state the output signal consists of a finite sequence of ones and zeros.
The machine repeats this sequence over and over again. Obviously, the whole process is deterministic. The details
of the approach to the stationary state depend on the initial value of the internal vector $\mathbf{x}_0$, but the stationary state
itself does not.

These observations are of much more general nature than the example given in Fig. 4 (left) suggests. In fact, as
the applications discussed below amply illustrate, the stationary-state analysis is a very useful tool to predict the
behavior of the machines. Assuming that $0 < 1 - \alpha \ll 1$ and that we have reached the stationary regime in which
the internal vector performs small oscillations about $(\cos\phi, \sin\phi)$, a simple calculation shows that in a single event
the angle of the internal vector changes by

$$\delta\phi_0 \approx \frac{1 - \alpha^2}{2}\,\frac{\cos\phi}{\sin\phi} \quad \text{if } \Theta_{n+1} = 0, \qquad \delta\phi_1 \approx \frac{\alpha^2 - 1}{2}\,\frac{\sin\phi}{\cos\phi} \quad \text{if } \Theta_{n+1} = 1. \tag{14}$$

In the stationary regime, we have $N_0\,\delta\phi_0 \approx -N_1\,\delta\phi_1$, where $N_0$ ($N_1$) is the number of $\Theta_{n+1} = 0$ ($\Theta_{n+1} = 1$) events.
From Eq. (14) it then follows immediately that $N_0/(N_0 + N_1) \approx \sin^2\phi$ and $N_1/(N_0 + N_1) \approx \cos^2\phi$. The results of
this analysis are in excellent agreement with the simulation results shown in Fig. 4 (right).
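The $\cos^2\phi$ law can be checked with a direct implementation of Eqs. (12) and (13). The following Python sketch makes several assumptions that are not in the text: the function names `circle_step` and `theta_one_fraction`, the fixed starting vector (1, 0) instead of a random one, and the tie-breaking order of the four rules.

```python
import math


def circle_step(x, y, alpha=0.99):
    """One event of the machine of Eqs. (12)-(13); returns (new_x, theta)."""
    x1, x2 = x
    best = None
    for theta in (0, 1):
        for s in (+1, -1):
            if theta == 0:
                cand = (alpha * x1, s * math.sqrt(1.0 + alpha**2 * (x2**2 - 1.0)))
            else:
                cand = (s * math.sqrt(1.0 + alpha**2 * (x1**2 - 1.0)), alpha * x2)
            cost = -(cand[0] * y[0] + cand[1] * y[1])  # Eq. (13)
            if best is None or cost < best[0]:
                best = (cost, cand, theta)
    return best[1], best[2]


def theta_one_fraction(phi_deg, n_events=1000, tail=500, alpha=0.99):
    """Fraction of theta = 1 events in the last `tail` events, and the final x."""
    phi = math.radians(phi_deg)
    y = (math.cos(phi), math.sin(phi))
    x = (1.0, 0.0)  # initial vector: an arbitrary choice for this sketch
    thetas = []
    for _ in range(n_events):
        x, theta = circle_step(x, y, alpha)
        thetas.append(theta)
    return sum(thetas[-tail:]) / tail, x


frac30, x30 = theta_one_fraction(30.0)
frac60, x60 = theta_one_fraction(60.0)
print(frac30, math.cos(math.radians(30.0)) ** 2)
print(frac60, math.cos(math.radians(60.0)) ** 2)
```

All four candidates are unit vectors, so minimizing Eq. (13) amounts to picking the candidate whose angle is closest to that of the input, which is why the internal vector ends up oscillating about the input direction.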

The conventional approach of regarding the variables $\Theta_{n+1}$ as input is fundamentally different from the approach
adopted in this paper. This can be seen by reformulating the update rules in terms of difference equations and
assuming that the $\Theta_{n+1} = 0, 1$ are independent, uniform random variables with mean $\Theta = \langle\Theta_{n+1}\rangle$. The four rules
Eq. (12) can be written as

$$x_{1,n+1}^2 = \alpha^2 x_{1,n}^2 + (1 - \alpha^2)\,\Theta_{n+1}, \qquad x_{2,n+1}^2 = \alpha^2 x_{2,n}^2 + (1 - \alpha^2)(1 - \Theta_{n+1}). \tag{15}$$

Formally Eq. (15) has the same structure as Eq. (4). Averaging over many realizations of $\{\Theta_{n+1} = 0, 1\}$ and taking
the limit $n \to \infty$ we obtain

$$\langle x_1^2\rangle = \lim_{n\to\infty} \langle x_{1,n+1}^2\rangle = \Theta, \qquad \langle x_2^2\rangle = \lim_{n\to\infty} \langle x_{2,n+1}^2\rangle = 1 - \Theta. \tag{16}$$

In other words, a machine that operates according to the rules Eq. (12) and receives as input the random sequence
$\Theta_{n+1}$ will (on average) approach a state in which the direction of its internal vector gives us an estimate of the
mean $\Theta$ of the input sequence ($\Theta_{n+1} = 0, 1$). In contrast, a machine that minimizes the cost Eq. (13) and updates its internal state according
to Eq. (12) responds on either output channel $\Theta_{n+1} = 0$ or output channel $\Theta_{n+1} = 1$, with a frequency that is directly
related to the difference between the current input angle and the angle defined by the internal vector.
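The step from Eq. (15) to Eq. (16) can be made explicit. Assuming, as above, that $\Theta_{n+1}$ is independent of the internal state and has mean $\Theta$, averaging the first relation of Eq. (15) gives a linear recursion of the same form as Eq. (4), with $\alpha^2$ in place of $\alpha$; its solution follows the pattern of Eq. (6):

```latex
\begin{align}
\langle x_{1,n+1}^2 \rangle &= \alpha^2 \langle x_{1,n}^2 \rangle + (1-\alpha^2)\,\Theta, \\
\langle x_{1,n}^2 \rangle &= \alpha^{2n}\, x_{1,0}^2 + (1-\alpha^{2n})\,\Theta
  \;\xrightarrow[\,n\to\infty\,]{}\; \Theta .
\end{align}
```

The second relation of Eq. (15) is treated in the same way and yields the limit $1 - \Theta$.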

D. Learning points on a K-dimensional hypersphere

Consider a sequence of events, characterized by vectors $\mathbf{y}_{n+1} = (y_{1,n+1}, y_{2,n+1}, \ldots, y_{K,n+1})$ for $n \ge 0$. The vector
$\mathbf{y}_{n+1}$ is the input for the machine. The internal state of the machine is described by a K-dimensional unit vector
$\mathbf{x}_n = (x_{1,n}, x_{2,n}, \ldots, x_{K,n})$. We define the 2K candidate update rules $\{j = 1, \ldots, K;\ s_j = \pm 1\}$ by

$$x_{i,n+1} = s_j\sqrt{1 + \alpha^2(x_{j,n}^2 - 1)} \quad \text{if } i = j, \qquad x_{i,n+1} = \alpha x_{i,n} \quad \text{if } i \ne j. \tag{17}$$

Note that $\mathbf{x}_n^T\mathbf{x}_n = 1$ implies $\mathbf{x}_{n+1}^T\mathbf{x}_{n+1} = 1$ for each of the 2K update rules. The machine responds to the input $\mathbf{y}_{n+1}$
by selecting from the 2K possible rules in Eq. (17) the update rule that minimizes the cost

$$C = -\mathbf{x}_{n+1}^T\,\mathbf{y}_{n+1}, \tag{18}$$

and by sending a message containing $\mathbf{y}_{n+1}$ (or, depending on the application, $\mathbf{x}_{n+1}$) on one of its output channels.
Note that the minimum of the cost function Eq. (18) does not depend on the length of the vectors $\mathbf{x}_{n+1}$ or $\mathbf{y}_{n+1}$.
Disregarding the variables $s_j$ that merely serve to determine the sign of $x_{j,n+1}$, there are K rules. Hence there can be
as many as K output channels. However, depending on the application, it may be expedient to reduce the number of
output channels by arranging them in groups.
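A compact Python sketch of the rule selection of Eqs. (17) and (18) is given below. It is an illustration under assumptions: the function name `hypersphere_step`, the example input vector, the starting vector and the 0-based rule index are all choices made here. The expectation that rule j is selected with a frequency close to $y_j^2$ for a fixed unit input is an extrapolation of the circle result of Section II C, not a statement taken from the text.

```python
import math


def hypersphere_step(x, y, alpha=0.99):
    """Apply the cheapest of the 2K rules of Eq. (17); returns (new_x, j)."""
    best = None
    for j in range(len(x)):
        for s in (+1, -1):
            # Eq. (17): rescale every component, then reset component j.
            cand = [alpha * xi for xi in x]
            cand[j] = s * math.sqrt(1.0 + alpha**2 * (x[j] ** 2 - 1.0))
            cost = -sum(c * yi for c, yi in zip(cand, y))  # Eq. (18)
            if best is None or cost < best[0]:
                best = (cost, cand, j)
    return best[1], best[2]


y = (1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0)  # fixed unit input vector (example values)
x = [1.0, 0.0, 0.0]                    # arbitrary unit starting vector
counts = [0, 0, 0]
for n in range(2000):
    x, j = hypersphere_step(x, y)
    if n >= 1000:                      # count only after the transient
        counts[j] += 1
fractions = [c / 1000 for c in counts]
print(fractions, [yi**2 for yi in y])
```

Each rule of Eq. (17) maps $x_j^2$ to $\alpha^2 x_j^2 + (1 - \alpha^2)$ when rule j is selected and to $\alpha^2 x_j^2$ otherwise, so the selection frequency of rule j over a long window equals the mean of $x_j^2$ up to a boundary term.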

E. Communication between events 

The machines analyzed in the previous subsections have one input channel that receives input and two output 
channels, only one of which sends out data (a message) at a particular event. An obvious generalization is to 
construct machines that accept, at a given instance, input from one out of two different sources. This is absolutely 
necessary if we want to build machines in which events can communicate or, in physical terms, interact with each 
other. We now demonstrate that the machines that we introduced above already have the capability to let events 
interact with each other. Therefore we do not need to add a new feature or rule to the machines. 

Consider a machine that has two input channels 0 and 1 and an internal vector $\mathbf{x}_n$ with K = 4 components. At the
(n + 1)-th event, either input channel 0 receives the two-component vector $\mathbf{y}_{n+1} = (y_{1,n+1}, y_{2,n+1})$ or input channel 1
receives the two-component vector $\mathbf{y}_{n+1} = (y_{3,n+1}, y_{4,n+1})$.

In the former case the machine transforms this input into the input vector $\hat{\mathbf{y}}_{n+1} = (y_{1,n+1}, y_{2,n+1}, x_{3,n}, x_{4,n})$ of
four elements by using the current internal vector as a source for the missing elements. Similarly, in the latter case
the input vector becomes $\hat{\mathbf{y}}_{n+1} = (x_{1,n}, x_{2,n}, y_{3,n+1}, y_{4,n+1})$. Then the machine uses $\hat{\mathbf{y}}_{n+1}$ to determine the cost and
selects the update rule according to the procedure described in Section II D (with $\hat{\mathbf{y}}_{n+1}$ replacing $\mathbf{y}_{n+1}$). This machine
learns the two-dimensional vectors $(y_{1,n+1}, y_{2,n+1})$ and $(y_{3,n+1}, y_{4,n+1})$ separately, as if it consists
of two separate, independent two-dimensional machines, with the additional crucial feature that the internal vector
represents a point on a 4-dimensional unit sphere.

It is not difficult to imagine what this machine does in the case that it receives events on only one of the two input
channels (say 0). Irrespective of the initial value of the internal vector $\mathbf{x}_0$, the machine will always select the update
rule with j = 1, 2 (see Eq. (17)) and the two components $x_{3,n}$ and $x_{4,n}$ will vanish exponentially fast with increasing n
(recall that $0 < \alpha < 1$). Thus, after a few events the internal state of the machine indicates that the machine receives
events on only one channel.

If the machine receives input on both channels (but never simultaneously), Eq. (17) implies that the machine
only scales the two components of the internal state that it uses to provide the missing elements for building the
input $\hat{\mathbf{y}}_{n+1}$. Therefore, in the stationary regime, the length of the two-dimensional vector $(x_{1,n}, x_{2,n})$ ($(x_{3,n}, x_{4,n})$) is
proportional to the number of events on input channel 0 (1). Furthermore the number of j = 1, 2 (j = 3, 4) events
is approximately equal to the number of events on input channel 0 (1). Although this may seem a very elementary
form of communication, it is sufficient to construct machines that perform very complicated tasks.
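The single-channel case described above can be illustrated with a short Python sketch. The helper names `step4` and `receive`, the example input (1, 0), the starting vector and the 0-based indices (rules 0 and 1 correspond to j = 1, 2 in the text) are assumptions of this example.

```python
import math


def step4(x, y_hat, alpha=0.99):
    """Cheapest rule of Eq. (17) for K = 4; returns (new_x, j) with j in 0..3."""
    best = None
    for j in range(4):
        for s in (+1, -1):
            cand = [alpha * xi for xi in x]
            cand[j] = s * math.sqrt(1.0 + alpha**2 * (x[j] ** 2 - 1.0))
            cost = -sum(c * yi for c, yi in zip(cand, y_hat))  # Eq. (18)
            if best is None or cost < best[0]:
                best = (cost, cand, j)
    return best[1], best[2]


def receive(x, channel, y, alpha=0.99):
    """Build the four-component input from a two-component event and update x."""
    if channel == 0:
        y_hat = (y[0], y[1], x[2], x[3])  # missing elements taken from x
    else:
        y_hat = (x[0], x[1], y[0], y[1])
    return step4(x, y_hat, alpha)


x = [0.5, 0.5, 0.5, 0.5]  # arbitrary unit starting vector
rules = []
for _ in range(1000):     # events arrive on channel 0 only
    x, j = receive(x, 0, (1.0, 0.0))
    rules.append(j)
print(x, set(rules))
```

Since only rules acting on the first two components are selected, the last two components are multiplied by $\alpha$ at every event and decay as $\alpha^n$, which is the exponential decay mentioned in the text.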

F. Summary 

The machines described above are simple deterministic machines that make decisions. The machine responds to the 
input event by choosing from all possible alternatives, the internal state that minimizes the error between the input and 
the internal state itself. Then the machine sends a message through one of its output channels. The message contains 
information about the decision the machine took while updating its internal state and, depending on the application, 
also contains other data that the machine can provide. By updating its internal state, the machine "learns" about the 
input it receives and by sending messages through one of its two output channels, it tells its environment about what 
it has learned. In the sequel we will call such a machine a deterministic learning machine (DLM). For a particular 
choice of the update rule (see Section II A), the machine performs linear estimation, but as the other examples of this 
Section amply demonstrate, minor modifications to this rule and/or cost function yield machines that may behave in 
a substantially different manner. 

III. APPLICATION TO BLIND CLASSIFICATION 

The DLM of Section II A learns about the input data by moving a point on a line. Obviously, this point separates
two parts of the line. The generalization to K-dimensional space is a (K - 1)-dimensional hyperplane that divides
the space into two parts. Thus, to interpret two-dimensional data the DLM should learn a line instead of a point.
We represent the line by a segment $L_n$ defined by its mid-point $\mathbf{x}_n$ and its direction $\mathbf{d}_n$. As the DLM receives an
event $\mathbf{y}_{n+1}$, i.e. a point in a two-dimensional plane, the DLM updates its internal line segment $L_n$ and sends the
information describing $L_n$ through the -1 (+1) channel, depending on whether the point lies on the left (right) side of the
line. The update procedure consists of two steps. First we define two support points $\mathbf{v}_1$ and $\mathbf{v}_2$ on either side of $\mathbf{x}_n$
along the direction $\mathbf{d}_n$ by

$$\mathbf{v}_1 = \mathbf{x}_n - \mathbf{d}_n/2, \qquad \mathbf{v}_2 = \mathbf{x}_n + \mathbf{d}_n/2, \tag{19}$$

and we update the two support points according to

$$\mathbf{v}_1 \leftarrow \mathbf{v}_1 + (1-\alpha)(\mathbf{y}_{n+1} - \mathbf{v}_1)\,\|\mathbf{y}_{n+1} - \mathbf{v}_1\|, \qquad \mathbf{v}_2 \leftarrow \mathbf{v}_2 + (1-\alpha)(\mathbf{y}_{n+1} - \mathbf{v}_2)\,\|\mathbf{y}_{n+1} - \mathbf{v}_2\|, \tag{20}$$

where $0 < \alpha < 1$ controls the learning process. Then we compute the new mid-point and direction of the line segment:

$$\mathbf{x}_{n+1} = (\mathbf{v}_1 + \mathbf{v}_2)/2, \qquad \mathbf{d}_{n+1} = (\mathbf{v}_1 - \mathbf{v}_2)/\|\mathbf{v}_1 - \mathbf{v}_2\|. \tag{21}$$

From Eq. (20) it follows that the support point farthest away from $\mathbf{y}_{n+1}$ makes the largest move. Therefore, as new
input data is received by the DLM, both the mid-point and the direction of the line segment change. Note that the
update rule Eq. (20) is non-linear in the difference between internal and input vector. Although a linear update rule
also works, our numerical experiments (results not shown) indicate that the non-linear rule Eq. (20) performs much
better.
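The two-step update of Eqs. (19)-(21) can be sketched as follows. The function name `update_segment`, the tuple representation of two-dimensional vectors, the initial segment and the repeated example input are assumptions of this illustration; the routing of the event to the -1 or +1 channel is left out.

```python
import math


def update_segment(x, d, y, alpha=0.99):
    """One event of Eqs. (19)-(21) in two dimensions; returns (new_x, new_d)."""
    # Eq. (19): support points on either side of the mid-point.
    supports = [(x[0] - d[0] / 2, x[1] - d[1] / 2),
                (x[0] + d[0] / 2, x[1] + d[1] / 2)]
    moved = []
    for v0, v1 in supports:
        e0, e1 = y[0] - v0, y[1] - v1
        dist = math.hypot(e0, e1)
        # Eq. (20): the step grows with the distance to the input point.
        moved.append((v0 + (1 - alpha) * e0 * dist, v1 + (1 - alpha) * e1 * dist))
    (a0, a1), (b0, b1) = moved
    # Eq. (21): new mid-point and unit direction (sign convention as in Eq. (21)).
    new_x = ((a0 + b0) / 2, (a1 + b1) / 2)
    norm = math.hypot(a0 - b0, a1 - b1)
    new_d = ((a0 - b0) / norm, (a1 - b1) / norm)
    return new_x, new_d


x, d = (0.0, 0.0), (1.0, 0.0)  # arbitrary initial segment
y = (0.3, -0.2)                # a single repeated input point (example)
for _ in range(2000):
    x, d = update_segment(x, d, y)
print(x, d)
```

For a single repeated input the mid-point is drawn to that point, which is the simplest instance of the mid-point tracking the mean of the data; only the current segment is kept in memory.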

In general $\mathbf{x}_n$ will converge to the mean of the input vectors and $\mathbf{v}_1$ and $\mathbf{v}_2$ will be pulled most strongly in the
direction of largest variance. Therefore $L_n$ will be (approximately) perpendicular to the largest principal component
of the covariance matrix of the input data. In other words, the DLM defined above can find the eigenvector that
corresponds to the largest eigenvalue of the covariance matrix by processing data points in a sequential manner, i.e.,
without actually having to compute the elements of the covariance matrix.

FIG. 5: Snapshots of the input data and results of a DLM-based classifier defined by Eqs. (19) - (21) (solid line) and a
conventional principal-component-based classifier (dashed line). The data points are random deviates with a normal
distribution with variance 1/2 and means $\pm(\cos(2\pi n/10000), \sin(2\pi n/10000))$. Each panel shows the output of the DLM-based
classifier after it has processed, point-by-point, the 100 data points shown. The classifier smoothly follows the rotation of
the means. In contrast to the event-by-event processing of the DLM-based classifier, the principal-component-based classifier
processes the whole set of 100 data points simultaneously.

As an illustration of the capabilities of the DLM introduced in this section, let us consider a classification task in
which we want to blindly group events into two categories. The input data $\mathbf{y}_{n+1} = (y_{1,n+1}, y_{2,n+1})$ are generated
through a Gaussian random process described by:

$$y_{1,n} = \cos\big((\gamma n + s)\pi\big) + r_1, \qquad y_{2,n} = \sin\big((\gamma n + s)\pi\big) + r_2, \tag{22}$$

where $s$ is a uniform random bit. The random numbers $r_1$ and $r_2$ are drawn from the normal distribution $N(0, 1/2)$.
In our numerical example we take $\gamma = 1/5000$ and $\alpha = 0.99$. From Eq. (22) it is clear that the input events consist of
points in a plane that are drawn from one of two ($s = 0, 1$) Gaussian distributions, the centers of which rotate with a
period of 10000 events. The mean of all input data is $(0, 0)$ and there is no preferred direction of largest variance. The
reason of course is that the center of the Gaussian distributions slowly moves on the unit circle. Clearly, this kind of
classification task can only be performed by permanently updating the estimate of the direction and that is exactly
what the DLM does. In Fig. 5 we present results of a blind classification experiment that illustrates the operation of
the DLM defined by the rules Eqs. (19)-(21). The DLM processes event-by-event, each time updating its estimate
for the separatrix. For comparison we also show the result obtained by the principal component analysis using as
input the group of 100 most recent data points processed by the DLM. The differences between both classifiers are
rather small so that it is clear that the DLM-based classifier performs very well.
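A short sketch of the event generator of Eq. (22) may help to see why the class centers rotate with a period of 10000 events. The helper names `make_event` and `class_mean` are hypothetical, and $N(0, 1/2)$ is read here as a normal distribution with variance 1/2, as stated in the caption of Fig. 5.

```python
import math
import random

def make_event(n, rng, gamma=1 / 5000, variance=0.5):
    """Return (s, (y1, y2)) for event number n, following Eq. (22)."""
    s = rng.randint(0, 1)                  # uniform random bit selecting the class
    angle = (gamma * n + s) * math.pi      # s = 1 shifts the center by pi
    sigma = math.sqrt(variance)            # standard deviation for variance 1/2
    y1 = math.cos(angle) + rng.gauss(0.0, sigma)
    y2 = math.sin(angle) + rng.gauss(0.0, sigma)
    return s, (y1, y2)

def class_mean(n, s, gamma=1 / 5000):
    """Noise-free center of class s at event number n."""
    angle = (gamma * n + s) * math.pi
    return math.cos(angle), math.sin(angle)
```

With $\gamma = 1/5000$ the angle $\gamma n \pi$ advances by $2\pi$ every 10000 events, and the two classes always sit at opposite points of the unit circle, so the mean of all data is the origin.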

The two-dimensional DLM described above can easily be extended to a DLM that processes $K$-dimensional input
data. Instead of a line segment the DLM has to learn a segment of a $(K-1)$-dimensional hyperplane. This can be done
by extending the procedure used in the two-dimensional case. The hyperplane segment is described by a mid-point
$\mathbf{x}_n$ and $K-1$ orthonormal directions $\mathbf{d}_k$ for $k = 1, \ldots, K-1$. We choose $K$ points $\{\mathbf{v}_k\}$ on the hyperplane defined
by $\{\mathbf{d}_k\}$ and $\mathbf{x}_n$ such that the distance between each pair of points is one. As new input data $\mathbf{y}_{n+1}$ is received by the
DLM these points are updated according to (the generalization of) Eq. (20). As in the two-dimensional case, from the
updated points we can calculate the new mid-point and the new directions. However, unlike in the two-dimensional
case, these directions do not need to be orthonormal. The orthonormality is then restored by using the (modified)
Gram-Schmidt procedure [16].

FIG. 6: Left: Diagram of the DLM network that simulates a polarizer on a deterministic, event-by-event basis. Right:
Simulation results for the DLM network shown on the left. Each data point represents the number of events in an output
channel accumulated after 1000 input events. After each set of 1000 events, the orientation $\phi$ of the polarizer is changed
randomly. Open circles: Normalized intensity in output channel 0 for incoming photons with a polarization angle $\psi = 25°$;
Solid line: Result ($\cos^2(\psi - \phi)$) obtained from quantum theory [6] for incoming photons with a polarization angle $\psi = 25°$;
Bullets: Normalized intensity in output channel 1 for incoming photons with a random polarization angle $\psi$; Dashed line:
Result of quantum theory [6] for incoming photons with a random polarization angle $\psi$.

FIG. 7: Left: Schematic representation of an experiment with three polarizers. Right: Simulation results for the network
of DLMs shown on the left. Each data point represents the normalized intensity accumulated over 1000 events. After each set
of 1000 events, the orientation $\phi$ of the polarizers 2 and 3 is changed randomly. Bullets: Output channel 0; Crosses: Output
channel 1; Open circles: Output channel 2; Open squares: Output channel 3. Lines represent the results of quantum theory [6].
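For readers unfamiliar with the orthonormalization step, the following is a minimal sketch of the modified Gram-Schmidt procedure in plain Python. It is a generic textbook version, written here for illustration; the function name and the list-of-lists representation are assumptions, not taken from the paper.

```python
import math

def modified_gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (lists of floats)."""
    basis = []
    for v in vectors:
        w = list(v)
        # Subtract, one at a time, the components along the vectors already
        # accepted; each projection uses the partially reduced vector w.
        for q in basis:
            proj = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis
```

Applied to the $K-1$ updated directions, this returns an orthonormal set spanning the same hyperplane, provided the directions remain linearly independent.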



IV. APPLICATION TO DETERMINISTIC SIMULATION OF QUANTUM INTERFERENCE 



A. Photon polarization 



We demonstrate that the DLM defined by Eqs. (12) and (13) and a passive element that performs a plane rotation
are sufficient to perform a deterministic simulation of the quantum theory [6] of photon polarization.

We start by recalling some elementary facts about photon polarization [17]. Some optically active materials like
calcite split an incoming beam of light into two spatially separated beams [19]. The light intensity of these beams
is related to the angle of polarization $\psi$ of the electromagnetic wave, relative to the orientation $\phi$ of the material.
We disregard all imperfections of real experiments and assume that the experimental data are in exact agreement
with the wave mechanical theory. Then the intensities $I_0$ of beam 0 and $I_1$ of beam 1 are given by [17]

$$I_0 = \cos^2(\psi - \phi), \qquad I_1 = \sin^2(\psi - \phi), \tag{23}$$

respectively. If the incident beam has a random polarization, averaging of Eq. (23) over all $\psi$ shows that half of the
light intensity will go to beam 0 and the other half to beam 1.

If the conventional light source is replaced by a source that emits one photon at a time, the photon leaves the material
either in the direction of beam 0 or beam 1, never in both [17]. Collecting photons over a sufficiently long period shows
that Eq. (23) still gives the number of photons detected in the direction of beam 0 (1), divided by the total number of
detected photons [17]. Quantum theory [6] describes the polarization in terms of a two-dimensional (complex-valued)
vector and the action of the material is to rotate this vector by an angle $\phi$ (set by the experimentalist). The
probability to observe photons in beam 0 (1) is given by the square of the 0-th (1-st) element of the vector. In
addition, as the photon leaves the material in beam 0 (1), its polarization is $\phi$ ($\phi + \pi/2$). Thus the piece of
material can be used to prepare and also determine the polarization of the photons and is called a "polarizer".

According to quantum theory [6], the polarizer rotates the vector of polarization amplitudes in the following
manner [18]:

$$\begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}. \tag{24}$$

Still according to quantum theory [6], the intensity in beam 0 (1) is given by $|b_0|^2$ ($|b_1|^2$). An incident beam with
an angle of polarization $\psi$ is described by the vector $(a_0, a_1) = (\cos\psi, \sin\psi)$. From Eq. (24) we obtain $(b_0, b_1) =
(\cos(\psi - \phi), \sin(\psi - \phi))$ and hence $I_0 = |b_0|^2 = \cos^2(\psi - \phi)$ and $I_1 = |b_1|^2 = \sin^2(\psi - \phi)$, in agreement with Eq. (23).
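As a quick numerical check of this derivation, the sketch below applies the rotation of Eq. (24) to $(\cos\psi, \sin\psi)$ and returns the two intensities. The function name `polarizer_intensities` is an assumption introduced here for illustration.

```python
import math

def polarizer_intensities(psi, phi):
    """Intensities (I0, I1) for polarization angle psi and polarizer angle phi."""
    a0, a1 = math.cos(psi), math.sin(psi)
    # Plane rotation of the amplitude vector, Eq. (24).
    b0 = math.cos(phi) * a0 + math.sin(phi) * a1
    b1 = -math.sin(phi) * a0 + math.cos(phi) * a1
    return b0 * b0, b1 * b1
```

Averaging the first return value over a uniform grid of angles $\psi$ gives 1/2, which is the statement made above for a randomly polarized beam.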

We now construct a simple deterministic machine that generates events of which the distribution agrees with the
probability distributions predicted by quantum theory [6]. The layout of this "polarizer" is shown in Fig. 6. The
incoming event (photon) carries an (unknown) angle $\psi_{n+1}$. The purpose of the passive element $R(\phi)$ is to perform a
rotation

$$R(\phi) = \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix}, \tag{25}$$

of the input vector $\mathbf{y}_{n+1} = (\cos\psi_{n+1}, \sin\psi_{n+1})$ by the angle $\phi$. The resulting vector $\mathbf{z}_{n+1} = (\cos(\psi_{n+1} - \phi), \sin(\psi_{n+1} - \phi))$
is sent to the input of a DLM that operates according to Eqs. (12) and (13). If $\Theta_{n+1} = 0$, the DLM responds
by sending the vector $\mathbf{z}'_{n+1} = (\cos\phi, \sin\phi)$ through the output channel 0. If $\Theta_{n+1} = 1$, the DLM responds by
sending the vector $\mathbf{z}'_{n+1} = (\cos(\phi + \pi/2), \sin(\phi + \pi/2))$ through the output channel 1. Clearly this procedure is
strictly deterministic. We emphasize that the DLM processes information event by event and does not store the data
contained in each event.
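The mechanism can be illustrated with a short deterministic simulation. Eqs. (12) and (13) are defined in an earlier section and are not reproduced here, so the update rule in `dlm_step` below is an assumed form: one component of the unit internal vector is renormalized, the other is shrunk by $\alpha$, and the candidate with the smallest cost $-\mathbf{w}\cdot\mathbf{y}$ is selected. All names are hypothetical, and the sketch should be read as an illustration of the idea rather than as the authors' exact rule.

```python
import math

def dlm_step(x, y, alpha):
    """Assumed K = 2 update: returns (new internal vector, output channel)."""
    best = None
    for j in (0, 1):                      # index of the component that grows
        k = 1 - j
        grown = math.sqrt(1.0 + alpha * alpha * (x[j] * x[j] - 1.0))
        for sign in (1.0, -1.0):
            w = [0.0, 0.0]
            w[j] = sign * grown           # keeps the vector on the unit circle
            w[k] = alpha * x[k]
            cost = -(w[0] * y[0] + w[1] * y[1])
            if best is None or cost < best[0]:
                best = (cost, w, j)
    return best[1], best[2]

def simulate_polarizer(psi, phi, n_events=10000, alpha=0.99, x0=(1.0, 0.0)):
    """Fraction of events leaving through channel 0 for fixed psi and phi."""
    x = list(x0)
    counts = [0, 0]
    for _ in range(n_events):
        # Passive rotation: z = (cos(psi - phi), sin(psi - phi)).
        z = (math.cos(psi - phi), math.sin(psi - phi))
        x, channel = dlm_step(x, z, alpha)
        counts[channel] += 1
    return counts[0] / n_events
```

Under this assumed rule the squared first component of the internal vector can only stay bounded if the rule that grows it is used with a relative frequency close to $\cos^2(\psi - \phi)$, which is how a strictly deterministic sequence can reproduce the intensities of Eq. (23).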

In Fig. 6 (right) we show simulation results for the machine depicted in Fig. 6 (left). Each data point represents
the intensity in beam 0 (1), i.e., the number of $\Theta = 0$ (1) events divided by the total number of events. The machine
is initialized once by choosing a random direction of the vector $\mathbf{x}_0$. The angle of rotation $\phi$ is kept fixed for 1000
events, then a uniform random number is used to select another direction, and this procedure is repeated 100 times.
In all these numerical experiments we set $\alpha = 0.99$. Fig. 6 shows the results for two different numerical experiments:
In the first set of 100 runs, the direction of polarization of the incoming photons is also determined by means of
uniform random numbers. In the second set of 100 runs, the direction of polarization of the incoming photons is fixed
($\psi = 25°$). From Fig. 6 (right) it is clear that quantum theory provides a very good description of the input-output
behavior of the DLM shown in Fig. 6 (left).




FIG. 8: Left: Diagram of the network of two DLMs that performs a deterministic simulation of a single-photon beam
splitter (BS) on an event-by-event basis [20]. The solid lines represent the input and output channels of the BS. Dashed lines
indicate the flow of data within the BS. Right: Simulation results for the beam splitter shown on the left. Input channel 0
receives $(y_{1,n+1}, y_{2,n+1}) = (\cos\psi_0, \sin\psi_0)$ with probability $p_0$. Input channel 1 receives $(y_{3,n+1}, y_{4,n+1}) = (\cos\psi_1, \sin\psi_1)$ with
probability $p_1 = 1 - p_0$. Each data point represents 10000 events. After each set of 10000 events, a uniform random number
in the range [0,360] is used to choose the angles $\psi_0$ and $\psi_1$. Markers give the simulation results for the normalized intensity
in output channel 0 as a function of $\phi = \psi_0 - \psi_1$. Open circles: $p_0 = 1$; Bullets: $p_0 = 0.5$; Open squares: $p_0 = 0.25$. Lines
represent the results of quantum theory [6].



As a second illustration we use the same DLM to simulate an experiment with three polarizers described by
Feynman [17]. The diagram of this experiment is shown in Fig. 7. A randomly polarized beam of photons passes
through the first polarizer (without loss of generality we set its angle $\phi_1$ equal to zero). Each output channel is used
as input to another polarizer. Both these polarizers are tilted by the same angle $\phi_2 = \phi_3 = \phi$. According to quantum
theory [6] the intensity at the output of these four channels is (from top to bottom, see Fig. 7) $\frac{1}{2}\cos^2\phi$, $\frac{1}{2}\sin^2\phi$,
$\frac{1}{2}\sin^2\phi$, and $\frac{1}{2}\cos^2\phi$. The results of our numerical experiments are shown in Fig. 7. The simulation procedure
is the same as the one used to generate the data of Fig. 6. Also in these numerical experiments we set $\alpha = 0.99$. We
emphasize once more that the randomness in these discrete-event simulations only enters through the characterization
of the photon source and through our procedure of selecting the direction of the polarizer for each set of 1000 events.
Actually, the latter only serves to counter the possible objection that the apparent quantum mechanical behavior
would be caused by monotonically changing the direction of the polarizers. As in the previous example, it is clear
that quantum theory [6] describes the input-output behavior of the three-DLM network very well.
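The four expected intensities follow from applying Eq. (23) twice. The sketch below makes the bookkeeping explicit, assuming that the first polarizer (angle zero) splits a randomly polarized beam equally into polarizations $0$ and $\pi/2$; the function names are hypothetical.

```python
import math

def split(psi, phi):
    """Eq. (23): intensities in channels 0 and 1 of a polarizer at angle phi."""
    return math.cos(psi - phi) ** 2, math.sin(psi - phi) ** 2

def three_polarizers(phi):
    """Normalized intensities of the four output channels, top to bottom."""
    out = []
    for psi_in in (0.0, math.pi / 2):     # polarizations leaving polarizer 1
        i0, i1 = split(psi_in, phi)
        out += [0.5 * i0, 0.5 * i1]       # each branch carries half the beam
    return out
```

The four values sum to one, and channels 0 and 3 (as well as 1 and 2) coincide, as in the quantum theoretical result quoted above.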



B. Beam splitter 



We now show that two $K = 4$ DLMs and two passive devices that perform a plane rotation by 45° are sufficient to
build a network that behaves as if it were a single-photon beam splitter. First we describe the network and then we
demonstrate that it acts as a beam splitter.

The network shown in Fig. 8 has two input channels (0 and 1) and two output channels (0 and 1). The network
receives events at one of the two input channels. Each input event carries information in the form of a two-dimensional
unit vector. Either input channel 0 receives $(y_{1,n+1}, y_{2,n+1})$ or input channel 1 receives $(y_{3,n+1}, y_{4,n+1})$. The input is fed
into the device described in Section II E. The purpose of this front-end DLM is to transform the information contained
in two-dimensional input vectors (of which only one is present for any given input event), into a four-dimensional
unit vector. The four-dimensional internal vector of this device is split into two groups of two-dimensional vectors
$(x_{1,n+1}, x_{4,n+1})$ and $(x_{3,n+1}, x_{2,n+1})$ and each of these two-dimensional vectors is rotated by 45°. Put differently, the
four-dimensional vector is rotated once in the (1,4)-plane by 45° and once in the (3,2)-plane by 45°. The order
of the rotations is irrelevant. The resulting four-dimensional vector is then sent to the input of a second $K = 4$ DLM.

This back-end DLM sends $(x_{1,n+1}, x_{2,n+1})/\sqrt{x_{1,n+1}^2 + x_{2,n+1}^2}$ through output channel 0 if it used rule $j = 1, 2$ (see
Eq. (17)) to update its internal state. Otherwise it sends $(x_{3,n+1}, x_{4,n+1})/\sqrt{x_{3,n+1}^2 + x_{4,n+1}^2}$ through output channel
1.

FIG. 9: Left: Diagram of a DLM network that simulates a single-photon Mach-Zehnder interferometer on an event-by-event
basis [20]. The DLM network consists of two BS devices (see Fig. 8 (left)) and two passive devices ($R(\phi_0)$ and $R(\phi_1)$) that
perform plane rotations by $\phi_0$ and $\phi_1$, respectively. There is a one-to-one correspondence between the elements of a physical
Mach-Zehnder interferometer and the units in the DLM network. The number of events $N_i$ in channel $i = 0, \ldots, 3$
corresponds to the probability for finding a photon on the corresponding arm of the interferometer. Right: Simulation results
for the DLM network shown on the left. Input channel 0 receives $(y_{1,n+1}, y_{2,n+1}) = (\cos\psi_0, \sin\psi_0)$ with probability one. A
uniform random number in the range [0,360] is used to choose the angle $\psi_0$. Input channel 1 receives no events. Each data
point represents 10000 events ($N_0 + N_1 = N_2 + N_3 = 10000$). Initially the rotation angle $\phi_0 = 0$ and after each set of 10000
events, $\phi_0$ is increased by 10°. Markers give the simulation results for the normalized intensities as a function of $\phi = \phi_0 - \phi_1$.
Open squares: $N_0/(N_0 + N_1)$; Solid squares: $N_2/(N_2 + N_3)$ for $\phi_1 = 0$; Open circles: $N_2/(N_2 + N_3)$ for $\phi_1 = 30°$; Bullets:
$N_2/(N_2 + N_3)$ for $\phi_1 = 240°$; Asterisks: $N_3/(N_2 + N_3)$ for $\phi_1 = 0$; Solid triangles: $N_3/(N_2 + N_3)$ for $\phi_1 = 300°$. Lines represent
the results of quantum theory [6].

The operation of the network depicted in Fig. 8 can be analyzed analytically if we disregard transient effects and
assume that the information carried by events on channel 0 (1) is given by $\mathbf{y}_{n+1} = \mathbf{y} = (y_1, y_2)$ ($\mathbf{y}'_{n+1} = \mathbf{y}' = (y_3, y_4)$).
We denote by $p$ the number of events on input channel 0 divided by the total number of events. Then, the fraction
of events on input channel 1 is given by $1 - p$.

In the stationary regime, the internal state $(x_{1,n+1}, x_{2,n+1}, x_{3,n+1}, x_{4,n+1})$ of the front-end DLM (see Fig. 8) learns
$(w_1, w_2, w_3, w_4) = (y_1\sqrt{p}, y_2\sqrt{p}, y_3\sqrt{1-p}, y_4\sqrt{1-p})$. Carrying out the two plane rotations of 45° we see that the
back-end DLM receives as input the four-dimensional vector $(w_1 - w_4, w_3 + w_2, w_3 - w_2, w_1 + w_4)/\sqrt{2}$. In the
stationary regime, the internal vector $(x_{1,n+1}, x_{2,n+1}, x_{3,n+1}, x_{4,n+1})$ of the back-end DLM oscillates about $(w_1 -
w_4, w_3 + w_2, w_3 - w_2, w_1 + w_4)/\sqrt{2}$. Therefore, in the stationary regime and for fixed two-dimensional vectors on input
channels 0 and 1, the input-output relation of the BS network of Fig. 8 can be written as

$$\begin{pmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{pmatrix} \xrightarrow{\ \mathrm{BS}\ } \frac{1}{\sqrt{2}} \begin{pmatrix} w_1 - w_4 \\ w_3 + w_2 \\ w_3 - w_2 \\ w_1 + w_4 \end{pmatrix}. \tag{26}$$

Using two complex numbers instead of four real numbers Eq. (26) can also be written as

$$\begin{pmatrix} w_1 + i w_2 \\ w_3 + i w_4 \end{pmatrix} \xrightarrow{\ \mathrm{BS}\ } \frac{1}{\sqrt{2}} \begin{pmatrix} w_1 - w_4 + i(w_3 + w_2) \\ w_3 - w_2 + i(w_1 + w_4) \end{pmatrix}. \tag{27}$$

In quantum theory [6] the presence of photons in the input modes 0 or 1 is represented by the probability amplitudes
$(a_0, a_1)$ [21, 22]. According to quantum theory [6], the probability amplitudes $(b_0, b_1)$ of the photons in the
output modes 0 and 1 of a beam splitter are given by

$$\begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} a_0 + i a_1 \\ a_1 + i a_0 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}. \tag{28}$$





FIG. 10: Diagram of a DLM network that simulates single-photon propagation through two chained Mach-Zehnder
interferometers on an event-by-event basis.
Identifying $a_0$ with $w_1 + i w_2 = (y_1 + i y_2)\sqrt{p}$ and $a_1$ with $w_3 + i w_4 = (y_3 + i y_4)\sqrt{1-p}$ it is clear that by construction,
the DLM network in Fig. 8 will allow us to simulate a beam splitter, not by calculating the amplitudes Eq. (28) but
by a deterministic event-by-event simulation.
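This identification can be checked numerically. The sketch below applies the two 45° plane rotations to the stationary internal vector $(y_1\sqrt{p}, y_2\sqrt{p}, y_3\sqrt{1-p}, y_4\sqrt{1-p})$ and compares the result with the standard 50/50 beam-splitter transformation $b_0 = (a_0 + i a_1)/\sqrt{2}$, $b_1 = (a_1 + i a_0)/\sqrt{2}$, which is assumed here as the quantum theoretical reference. Function names are hypothetical.

```python
import math

def rotate45(u, v):
    """Rotate the pair (u, v) by 45 degrees in its plane."""
    c = math.sqrt(0.5)
    return c * (u - v), c * (u + v)

def bs_network_stationary(y, y_prime, p):
    """Stationary input to the back-end DLM, as two complex numbers."""
    w1, w2 = (math.sqrt(p) * t for t in y)
    w3, w4 = (math.sqrt(1 - p) * t for t in y_prime)
    x1, x4 = rotate45(w1, w4)      # rotation in the (1,4)-plane
    x3, x2 = rotate45(w3, w2)      # rotation in the (3,2)-plane
    return complex(x1, x2), complex(x3, x4)

def bs_quantum(a0, a1):
    """Assumed 50/50 beam-splitter amplitudes for input amplitudes (a0, a1)."""
    s = math.sqrt(0.5)
    return s * (a0 + 1j * a1), s * (a1 + 1j * a0)
```

For any unit vectors `y`, `y_prime` and any `p` between 0 and 1, both functions return the same pair of complex numbers when `a0` and `a1` are built as in the identification above, which is the content of the statement "by construction".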

In Fig. 8 (right) we present results of discrete-event simulations using the DLM network depicted in Fig. 8 (left).
Before the simulation starts, the internal vectors of the DLMs are given a random value (on the unit sphere). Each
data point represents 10000 events. All these simulations were carried out with $\alpha = 0.99$. For each set of 10000
events, a uniform random number in the range [0,360] generates two angles $\psi_0$ and $\psi_1$. Input channel 0 receives
$(y_{1,n+1}, y_{2,n+1}) = (\cos\psi_0, \sin\psi_0)$ with probability $p_0$. Input channel 1 receives $(y_{3,n+1}, y_{4,n+1}) = (\cos\psi_1, \sin\psi_1)$ with
probability $p_1 = 1 - p_0$. Random processes only enter in the procedure to generate the input data. The DLM network
processes the events sequentially and deterministically. From Fig. 8 it is clear that the output of the deterministic
DLM-based beam splitter reproduces the probability distributions as obtained from quantum theory [6].



C. Mach-Zehnder interferometer 

In quantum physics [6], single-photon experiments with one beam splitter provide direct evidence for the particle-like
behavior of photons [4, 12]. The wave mechanical character appears when one performs single-particle interference
experiments. In this subsection we construct a DLM network that displays the same interference patterns as those
observed in single-photon Mach-Zehnder interferometer experiments [12].

The schematic layout of the DLM network is shown in Fig. 9. Not surprisingly, it is exactly the same as that
of a real Mach-Zehnder interferometer. The BS network described in the previous subsection is used for the beam
splitters. The phase shift is taken care of by a passive device that performs a plane rotation. Clearly there is a
one-to-one mapping from each relevant component in the interferometer to a processing unit in the DLM network.
Recall that the processing units in the DLM network only communicate with each other through the message (photon)
that propagates through the network.

FIG. 11: Absolute value of the difference between the normalized intensity $N_4/(N_4 + N_5)$ in output channel 0 of the
event-based DLM simulation and the result of quantum theory [6] for the system of two chained Mach-Zehnder interferometers
shown in Fig. 10 [20]. Input channel 0 receives $(y_{1,n+1}, y_{2,n+1}) = (\cos\psi_0, \sin\psi_0)$ with probability $p_0$. Input channel 1 receives
$(y_{3,n+1}, y_{4,n+1}) = (\cos\psi_1, \sin\psi_1)$ with probability $1 - p_0$. For each event a uniform random number in the range [0,360]
determines $\psi_0$ or $\psi_1$. Each data point represents a simulation of 10000 events ($N_0 + N_1 = N_2 + N_3 = N_4 + N_5 = 10000$).
Top-left: Difference as a function of $p_0$; Top-right: Difference as a function of $\psi_0 - \psi_1$; Bottom-left: Difference as a function of
$\phi_0 - \phi_1$; Bottom-right: Difference as a function of $\phi_2 - \phi_3$.



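According to quantum theory [6], the probability amplitudes $(b_0, b_1)$ of the photons in the output modes 0 ($N_2$)
and 1 ($N_3$) of the Mach-Zehnder interferometer are given by

$$\begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix} \begin{pmatrix} e^{i\phi_0} & 0 \\ 0 & e^{i\phi_1} \end{pmatrix} \begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}. \tag{29}$$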
Note that in a quantum mechanical setting it is impossible to simultaneously measure $(N_0/(N_0 + N_1), N_1/(N_0 + N_1))$
and $(N_2/(N_0 + N_1), N_3/(N_0 + N_1))$: Photon detectors operate by absorbing photons. However, in our deterministic,
event-based simulation there is no such problem.
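For reference, the quantum theoretical intensities at the interferometer output can be computed by composing a beam splitter, two phase shifts, and a second beam splitter. The sketch below assumes the standard 50/50 beam-splitter matrix $\frac{1}{\sqrt{2}}\begin{pmatrix}1 & i\\ i & 1\end{pmatrix}$ and phase factors $e^{i\phi_0}$, $e^{i\phi_1}$ on the two arms; the function names are hypothetical.

```python
import cmath
import math

def beam_splitter(a0, a1):
    """Assumed 50/50 beam splitter acting on the amplitudes (a0, a1)."""
    s = math.sqrt(0.5)
    return s * (a0 + 1j * a1), s * (a1 + 1j * a0)

def mach_zehnder(a0, a1, phi0, phi1):
    """Output intensities (|b0|^2, |b1|^2) of the interferometer."""
    c0, c1 = beam_splitter(a0, a1)
    # Phase shifts on the two arms between the beam splitters.
    c0, c1 = c0 * cmath.exp(1j * phi0), c1 * cmath.exp(1j * phi1)
    b0, b1 = beam_splitter(c0, c1)
    return abs(b0) ** 2, abs(b1) ** 2
```

With all photons entering through input channel 0, this gives $|b_0|^2 = \sin^2((\phi_0 - \phi_1)/2)$ and $|b_1|^2 = \cos^2((\phi_0 - \phi_1)/2)$, independent of the input angle $\psi_0$.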

In Fig. 9 (right) we present a small selection of simulation results for the Mach-Zehnder interferometer built from DLMs.
We assume that input channel 0 receives $(y_{1,n+1}, y_{2,n+1}) = (\cos\psi_0, \sin\psi_0)$ with probability one and that input
channel 1 receives no events. This corresponds to $(a_0, a_1) = (\cos\psi_0 + i\sin\psi_0, 0)$. We use uniform random numbers
to determine $\psi_0$. In all these simulations $\alpha = 0.99$. The data points are the simulation results for the normalized
intensity $N_i/(N_0 + N_1)$ for $i = 0, 2, 3$ as a function of $\phi = \phi_0 - \phi_1$. Lines represent the corresponding results of
quantum theory [6]. From Fig. 9 it is clear that quantum theory provides an excellent description of the deterministic,
event-based processing by the DLM network.

FIG. 12: Same as Fig. 11 except that $\alpha = 0.9999$ (instead of $\alpha = 0.99$) and that 1000000 events (instead of 10000) per data
point were processed by the DLM network depicted in Fig. 10.

The examples presented in Fig. 9 do not rule out that there may be settings for the angles $\psi_0$, $\phi_0$ and $\phi_1$ for which
quantum theory fails to give a good description of the behavior of the DLM network. However, extensive series of
simulations show that this is not the case. Instead of presenting the results of these simulations we will demonstrate
that quantum theory [6] also describes the stationary-state input-output behavior of more extended DLM networks.

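As an example we consider the DLM network depicted in Fig. 10. Obviously this network maps exactly onto two
chained Mach-Zehnder interferometers [20]. Now there are seven parameters $p_0$, $\psi_0$, $\psi_1$, $\phi_0$, $\phi_1$, $\phi_2$, and $\phi_3$ that may
be varied, so simply plotting selected cases is not the proper procedure to establish that quantum theory describes
the stationary-state behavior of the DLM network. Therefore we adopt the following strategy. For each set of 10000
events, we use seven random numbers to fix the parameters $p_0$, $\psi_0$, $\psi_1$, $\phi_0$, $\phi_1$, $\phi_2$, and $\phi_3$. Then we collect the data
for these 10000 events and compare the intensity in output channel 0 ($N_4$) and 1 ($N_5$) with the corresponding results
of quantum theory [6]. The latter is given by

$$\begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = \frac{1}{2\sqrt{2}} \begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix} \begin{pmatrix} e^{i\phi_2} & 0 \\ 0 & e^{i\phi_3} \end{pmatrix} \begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix} \begin{pmatrix} e^{i\phi_0} & 0 \\ 0 & e^{i\phi_1} \end{pmatrix} \begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}. \tag{30}$$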

FIG. 13: Simulation results for a Mach-Zehnder interferometer built from SLMs instead of DLMs. Each beam splitter sends
messages over its output channels 0 and 1 in a random manner. The simulation procedure and annotations are exactly the
same as in Fig. 9.



For each choice of $\{p_0, \psi_0, \psi_1, \phi_0, \phi_1, \phi_2, \phi_3\}$ we compute the differences $\big|\,|b_0|^2 - N_4/(N_4 + N_5)\big|$ and $\big|\,|b_1|^2 - N_5/(N_4 + N_5)\big|$.
$N_4$ ($N_5$) is the number of events in the output channel 0 (1) of the third beam splitter. $N_0 + N_1 = N_2 + N_3 = N_4 + N_5$
is the total number of events (10000 in this case). In Fig. 11 we show $\big|\,|b_0|^2 - N_4/(N_4 + N_5)\big|$ as a function of $p_0$,
$\psi_0 - \psi_1$, $\phi_0 - \phi_1$, and $\phi_2 - \phi_3$. In all these simulations $\alpha = 0.99$. Once again it is clear that quantum theory [6]
provides a very good description of a DLM-based simulation of two chained Mach-Zehnder interferometers.

D. Technical note 

All simulations that we presented in this section have been performed for $\alpha = 0.99$. From the description of the
learning process it is clear that $\alpha$ controls the rate of learning or, equivalently, the rate at which learned information
can be forgotten. Furthermore it is evident that the difference between a constant input to a DLM and the learned
value of its internal variable cannot be smaller than $1 - \alpha$. In other words, $\alpha$ also limits the precision with which the
internal variable can represent a sequence of constant input values. On the other hand, the number of events has to
balance the rate at which the DLM can forget a learned input value. The smaller $1 - \alpha$ is, the larger the number of
events has to be for the DLM to adapt to changes in the input data.

We use the last example of Section IV C to illustrate the effect of changing $\alpha$ and the total number of events $N$. In
Fig. 12 we show the results of repeating the procedure used to obtain the data shown in Fig. 11 but instead of $\alpha = 0.99$
and $N = 10000$ events per data point, we used $\alpha = 0.9999$ and $N = 1000000$ events per data point. As expected, the
difference between the simulation data and the results of quantum theory decreases if $1 - \alpha$ decreases and $N$ increases
accordingly. Comparing Fig. 11 with Fig. 12 it is clear that the decrease of this difference is roughly proportional to
the inverse of the square root of the number of events. Note that each data point in Fig. 12 is generated without the
use of random processes.

V. STOCHASTIC LEARNING MACHINES 

In the stationary regime, the sequence of messages that a DLM (network) generates is strictly deterministic. For
some applications, e.g. for quantum physics [6], it may be desirable to randomize these sequences. A marginal
modification turns a DLM into a stochastic learning machine (SLM). Here the term stochastic does not refer to the
learning process but to the method that is used to select the output channel that will carry the outgoing message.



In the stationary regime the components of the internal vector represent the probability amplitudes. Comparing the
(sums of) squares of these amplitudes with a uniform random number $0 < r < 1$ gives the probability for sending the
message over the corresponding output channel. For instance, in the case of the beam splitter BS (see Fig. 8) we replace
the back-end DLM by a SLM. This SLM will send a message over output channel 0 if $x_{1,n+1}^2 + x_{2,n+1}^2 \ge r$. Otherwise
it will activate output channel 1. Although the learning process of this modified BS network is still deterministic, in
the stationary regime the output messages are randomly distributed over the two output channels. Of course, the
distribution of output messages is the same as that of the original DLM-network.
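The channel selection of an SLM amounts to a single comparison with a random number. The sketch below assumes that channel 0 is chosen when $x_1^2 + x_2^2$ is at least as large as $r$, which is the direction required for channel 0 to receive a fraction $x_1^2 + x_2^2$ of the messages; the function name `slm_channel` and the use of a seeded `random.Random` instance are illustrative choices.

```python
import random

def slm_channel(x, rng):
    """Pick output channel 0 or 1 from the four-component internal vector x."""
    r = rng.random()                       # uniform random number in [0, 1)
    weight0 = x[0] ** 2 + x[1] ** 2        # sum of squares for channel 0
    return 0 if weight0 >= r else 1
```

For a unit internal vector, the long-run fraction of messages on channel 0 then equals $x_1^2 + x_2^2$, while the order in which the two channels fire is random.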

Replacing DLMs by SLMs in a DLM-network changes the order in which messages are being processed by the network
but leaves the content of the messages intact. Therefore, in the stationary regime, the distribution of messages over
the outputs of the SLM-network is essentially the same as that of the original DLM network.

As an illustration of the use of SLMs, we replace the two back-end DLMs in the Mach-Zehnder interferometer
network (see Fig. 9 (left)) by their "randomized" version and repeat the procedure that generates the data of Fig. 9
(right). The results of these simulations are shown in Fig. 13. Not unexpectedly, the randomness in the output channel
selection is reflected by a (small) increase of the scatter on the data points. In this simulation, the output channels
0 and 1 of each beam splitter are activated in a random manner and the functional dependence of $N_0/(N_0 + N_1)$,
$N_1/(N_0 + N_1)$, $N_2/(N_2 + N_3) = N_2/(N_0 + N_1)$ and $N_3/(N_2 + N_3)$ on $\phi$ is still in full agreement with quantum
theory [6]. In other words, this SLM-network performs a genuine, event-by-event simulation of the ideal (perfect
detectors, etc.) version of both the single-photon beam splitter and Mach-Zehnder interferometer experiments by
Grangier et al. [12].

VI. DISCUSSION 

We have proposed a new procedure to construct deterministic algorithms that have primitive learning capabilities. 
We have used these algorithms to build deterministic learning machines (DLMs). A DLM learns by processing event 
after event but does not store the data contained in an individual event. Connecting the input of a DLM to the 
output of another DLM yields a locally connected network of DLMs. A DLM within the network locally processes the 
information contained in an event and responds by sending a message that may be used as input for another DLM. 
A distinct feature of a DLM network is that at any given time, only one event (message) is propagating through the 
network. The DLMs process messages in a sequential manner and only communicate with each other by message 
passing. 

We have demonstrated that DLM networks can discover relationships between successive events (see Section III)
and that certain classes of DLM networks exhibit behavior that is usually only attributed to quantum systems. In
Sections IV and V we have presented DLM networks that simulate quantum interference on an event-by-event basis.
More specifically, we map each physical part of the real Mach-Zehnder interferometer onto a DLM and the messages
(phase shifts in this case) are carried by photons. No ingredient other than simple geometry is used to specify the
update rules of the DLMs.

As the network processes event after event, the network generates output events that build an interference pattern 
that is described by the quantum theory of the single-photon beam splitter and Mach-Zehnder interferometer. 
To illustrate that DLM networks are indeed capable of simulating quantum interference on an event-by-event basis 
we also simulate an experiment involving three beam splitters (i.e. two chained Mach-Zehnder interferometers) and 
demonstrate that quantum theory [6] also describes the behavior of this network.

The results presented in Sections IV and V suggest that we may have discovered a systematic procedure to construct
algorithms that simulate quantum phenomena using deterministic, local, and event-by-event-based processes. We
emphasize that our approach is not a proposal for another interpretation of quantum mechanics. Our approach is
not an extension of quantum theory in any sense: The probability distributions of quantum theory appear as the
result of a deterministic, causal learning process, and not vice versa (see Section IV) [11]. Our results suggest that
quantum mechanical behavior may originate from an underlying deterministic process [23, 24]. Indeed, it is somewhat
ironic that in order to mimic the apparent randomness with which events are observed in experiments, we have to
explicitly randomize the output of the DLMs to mask the underlying deterministic processes (see Section V). To the
best of our knowledge, this paper contains the first demonstration that quantum interference can be simulated on an
event-by-event basis using local, causal, and deterministic processes, and without using concepts such as wave fields
or particle-wave duality.

At this point it may be worthwhile to recall what a DLM actually does. In a simple physical picture, a DLM is a 
device (e.g. beam splitter, polarizer) that exchanges information with the particles that pass through it. The DLM 
tries to do this in an effective manner. It learns by comparing the message carried by an event with predictions based 
on the knowledge acquired by the DLM during the processing of previous events. Effectively this comparison amounts 
to a minimization of the squared error (see Section II). Schrödinger used exactly the same principle to derive his
famous equation [25] but called this approach "unverständlich" in a subsequent publication [26].

In a future publication we will show that the approach introduced in this paper can be employed to perform
event-based simulations of a universal quantum computer [27]. It has been shown that the time evolution of the wave
function of a quantum system can be simulated on a quantum computer. Therefore it should be possible to
compute the real-time dynamics of these systems (including the double-slit experiment mentioned in the introduction)
through discrete-event simulation by constructing appropriate DLM networks.



Acknowledgement 



We thank S. Miyashita for extensive discussions. 



[1] D.P. Landau and K. Binder, A Guide to Monte Carlo Simulations in Statistical Physics, Cambridge University Press,
Cambridge (2000)

[2] A. Tonomura, The Quantum World Unveiled by Electron Waves, World Scientific, Singapore (1998) 

[3] In this paper we disregard limitations of real experiments such as detector efficiency, imperfection of the source, biprism 
etc. 

[4] D. Home, Conceptual Foundations of Quantum Physics, Plenum Press, New York (1997) 
[5] N.G. Van Kampen, Physica A 153, 97 (1988) 

[6] We make a distinction between quantum theory and quantum physics. We use the term quantum theory when we refer to 
the mathematical formalism, i.e., the postulates of quantum mechanics (with or without the wave function collapse 
postulate) [8] and the rules (algorithms) to compute the wave function. The term quantum physics is used for
microscopic, experimentally observable phenomena that do not find an explanation within the mathematical framework 
of classical mechanics. 

[7] R.P. Feynman, R.B. Leighton, M. Sands, The Feynman Lectures on Physics, Vol. 3, Addison-Wesley, Reading MA (1996)
[8] L.E. Ballentine, Quantum Mechanics: A Modern Development, World Scientific, Singapore (2003)

[9] H. De Raedt, Computer Simulation of Quantum Phenomena in Nano-Scale Devices, Annual Reviews of Computational
Physics IV, ed. D. Stauffer, World Scientific, 107 (1996)

[10] A large collection of videos of such simulations can be found at http://www.compphys.org/quantummechanics

[11] R. Penrose, The Emperor's New Mind, Oxford University Press, Oxford (1990) 

[12] P. Grangier, G. Roger, and A. Aspect, Europhys. Lett. 1, 173 (1986)

[13] S. Haykin, Neural Networks, Prentice Hall, New Jersey (1999) 

[14] S. Haykin, Adaptive Filter Theory, Prentice Hall, New Jersey (1986) 

[15] K.V. Mardia, J.T. Kent, and J.M. Bibby, Multivariate Analysis, Academic Press, London (1982) 

[16] G.H. Golub and C.F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore (1996)

[17] R.P. Feynman, Int. J. Theor. Phys. 21, 467 (1982) 

[18] G. Baym, Lectures on Quantum Mechanics, W.A. Benjamin, Reading MA (1974) 
[19] M. Born and E. Wolf, Principles of Optics, Pergamon, Oxford (1964) 

[20] An interactive program that performs the event-based simulations of a beam splitter, one Mach-Zehnder interferometer,
and two chained Mach-Zehnder interferometers can be found at http://www.compphys.net/dlm
[21] J.G. Rarity and P.R. Tapster, Phil. Trans. R. Soc. Lond. A 355, 2267 (1997) 

[22] M. Nielsen and I. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, Cambridge 
(2000) 

[23] G. 't Hooft, "Determinism beneath Quantum Mechanics", quant-ph/0105105

[24] G. 't Hooft, "Quantum Mechanics and Determinism", quant-ph/0212095

[25] E. Schrödinger, Ann. Phys. 79, 361 (1926)

[26] E. Schrödinger, Ann. Phys. 79, 491 (1926)

[27] K. Michielsen, K. De Raedt, and H. De Raedt, in preparation. 

[28] C. Zalka, Proc. R. Soc. Lond. A 454, 313 (1998)



